智能论文笔记

Efficient Planar Pose Estimation via UWB Measurements

Haodong Jiang , Wentao Wang , Yuan Shen , Xinghan Li , Xiaoqiang Ren , Junfeng Wu

分类：机器人

2022-09-14

国家估计是自主系统的重要组成部分。已显示整合超宽带（UWB）技术可以纠正长期估计漂移并绕过环路闭合检测的复杂性。但是，机器人技术中很少有作品采用UWB作为独立的状态估计技术。这项工作的主要目的是仅使用UWB范围测量结果研究平面姿势估计，并研究估计器的统计效率。我们证明了两步方案的出色属性，该方案说，我们可以通过高斯 - 纽顿迭代的一步来完善一致的估计器在渐近上有效。基于此结果，我们设计了GN-uls估计器，并通过模拟和收集的数据集进行评估。GN-uls在我们的静态数据集上达到毫米和次级水平的准确性，并在我们的动态数据集中达到厘米和学位水平的精度，从而提出了仅将UWB用于实时状态估计的可能性。

translated by 谷歌翻译

Object Scan Context: Object-centric Spatial Descriptor for Place Recognition within 3D Point Cloud Map

Haodong Yuan , Yudong Zhang , Shengyin Fan , Xue Li , Jian Wang

分类：计算机视觉

2022-06-07

位置识别技术赋予了一种大满贯算法，具有消除累积错误并自身重新定位的能力。基于点云的位置识别的现有方法通常利用以激光雷达为中心的全局描述符的匹配。这些方法具有以下两个主要缺陷：当两个点云之间的距离很远时，不能执行位置识别，并且只能计算旋转角度，而无需在x和y方向上偏移。为了解决这两个问题，我们提出了一个新颖的全球描述符，该描述符围绕主要对象构建，以这种方式，描述符不再依赖于观察位置。我们分析了该方法可以完美地解决上述两个问题的理论，并在Kitti和一些极端情况下进行了许多实验，这表明我们的方法比传统方法具有明显的优势。

translated by 谷歌翻译

Persia: A Hybrid System Scaling Deep Learning Based Recommenders up to 100 Trillion Parameters

Xiangru Lian , Binhang Yuan , Xuefeng Zhu , Yulong Wang , Yongjun He , Honghuan Wu , Lei Sun , Haodong Lyu , Chengjun Liu , Xing Dong

分类：机器学习

2021-11-10

基于深度学习的模型占主导地位的生产推荐系统的当前景观。此外，近年来目睹了模型规模的指数增长 - 从谷歌的2016年模型，最新的Facebook的型号有10亿个参数，具有12万亿参数。型号容量的每次跳跃都有显着的质量增强，这使我们相信100万亿参数的时代即将来临。然而，即使在工业规模数据中心内，这些模型的培训也在挑战。这种困难是从训练计算的惊人的异质性继承 - 模型的嵌入层可以包括总模型尺寸的99.99％，这是极其内存密集的;虽然其余的神经网络越来越多地计算密集型。为支持培训此类巨大模式，迫切需要有效的分布式培训系统。在本文中，我们通过仔细共同设计优化算法和分布式系统架构来解决这一挑战。具体而言，为了确保培训效率和训练精度，我们设计一种新型混合训练算法，其中嵌入层和密集的神经网络由不同的同步机制处理;然后，我们构建一个名为Persia的系统（短暂的并行推荐培训系统，其中包含混合加速），以支持这种混合培训算法。理论上的示范和实证研究均达到100万亿参数，以证明了波斯的系统设计和实施。我们将Pensia公开使用（在https://github.com/persiamml/persia），以便任何人都能够以100万亿参数的规模轻松培训推荐模型。

translated by 谷歌翻译

RELIANT: Fair Knowledge Distillation for Graph Neural Networks

Yushun Dong , Binchi Zhang , Yiling Yuan , Na Zou , Qi Wang , Jundong Li

分类：机器学习

2023-01-03

Graph Neural Networks (GNNs) have shown satisfying performance on various graph learning tasks. To achieve better fitting capability, most GNNs are with a large number of parameters, which makes these GNNs computationally expensive. Therefore, it is difficult to deploy them onto edge devices with scarce computational resources, e.g., mobile phones and wearable smart devices. Knowledge Distillation (KD) is a common solution to compress GNNs, where a light-weighted model (i.e., the student model) is encouraged to mimic the behavior of a computationally expensive GNN (i.e., the teacher GNN model). Nevertheless, most existing GNN-based KD methods lack fairness consideration. As a consequence, the student model usually inherits and even exaggerates the bias from the teacher GNN. To handle such a problem, we take initial steps towards fair knowledge distillation for GNNs. Specifically, we first formulate a novel problem of fair knowledge distillation for GNN-based teacher-student frameworks. Then we propose a principled framework named RELIANT to mitigate the bias exhibited by the student model. Notably, the design of RELIANT is decoupled from any specific teacher and student model structures, and thus can be easily adapted to various GNN-based KD frameworks. We perform extensive experiments on multiple real-world datasets, which corroborates that RELIANT achieves less biased GNN knowledge distillation while maintaining high prediction utility.

translated by 谷歌翻译

High-Quality Supersampling via Mask-reinforced Deep Learning for Real-time Rendering

Hongliang Yuan , Boyu Zhang , Mingyan Zhu , Ligang Liu , Jue Wang

分类：计算机视觉

2023-01-03

To generate high quality rendering images for real time applications, it is often to trace only a few samples-per-pixel (spp) at a lower resolution and then supersample to the high resolution. Based on the observation that the rendered pixels at a low resolution are typically highly aliased, we present a novel method for neural supersampling based on ray tracing 1/4-spp samples at the high resolution. Our key insight is that the ray-traced samples at the target resolution are accurate and reliable, which makes the supersampling an interpolation problem. We present a mask-reinforced neural network to reconstruct and interpolate high-quality image sequences. First, a novel temporal accumulation network is introduced to compute the correlation between current and previous features to significantly improve their temporal stability. Then a reconstruct network based on a multi-scale U-Net with skip connections is adopted for reconstruction and generation of the desired high-resolution image. Experimental results and comparisons have shown that our proposed method can generate higher quality results of supersampling, without increasing the total number of ray-tracing samples, over current state-of-the-art methods.

translated by 谷歌翻译

PanopticPartFormer++: A Unified and Decoupled View for Panoptic Part Segmentation

Xiangtai Li , Shilin Xu , Yibo Yang , Haobo Yuan , Guangliang Cheng , Yunhai Tong , Zhouchen Lin , Dacheng Tao

分类：计算机视觉

2023-01-03

Panoptic Part Segmentation (PPS) unifies panoptic segmentation and part segmentation into one task. Previous works utilize separated approaches to handle thing, stuff, and part predictions without shared computation and task association. We aim to unify these tasks at the architectural level, designing the first end-to-end unified framework named Panoptic-PartFormer. Moreover, we find the previous metric PartPQ biases to PQ. To handle both issues, we make the following contributions: Firstly, we design a meta-architecture that decouples part feature and things/stuff feature, respectively. We model things, stuff, and parts as object queries and directly learn to optimize all three forms of prediction as a unified mask prediction and classification problem. We term our model as Panoptic-PartFormer. Secondly, we propose a new metric Part-Whole Quality (PWQ) to better measure such task from both pixel-region and part-whole perspectives. It can also decouple the error for part segmentation and panoptic segmentation. Thirdly, inspired by Mask2Former, based on our meta-architecture, we propose Panoptic-PartFormer++ and design a new part-whole cross attention scheme to further boost part segmentation qualities. We design a new part-whole interaction method using masked cross attention. Finally, the extensive ablation studies and analysis demonstrate the effectiveness of both Panoptic-PartFormer and Panoptic-PartFormer++. Compared with previous Panoptic-PartFormer, our Panoptic-PartFormer++ achieves 2% PartPQ and 3% PWQ improvements on the Cityscapes PPS dataset and 5% PartPQ on the Pascal Context PPS dataset. On both datasets, Panoptic-PartFormer++ achieves new state-of-the-art results with a significant cost drop of 70% on GFlops and 50% on parameters. Our models can serve as a strong baseline and aid future research in PPS. Code will be available.

translated by 谷歌翻译

CLIP-Driven Universal Model for Organ Segmentation and Tumor Detection

Jie Liu , Yixiao Zhang , Jie-Neng Chen , Junfei Xiao , Yongyi Lu , Bennett A. Landman , Yixuan Yuan , Alan Yuille , Yucheng Tang , Zongwei Zhou

分类：计算机视觉 | 机器学习

2023-01-02

An increasing number of public datasets have shown a marked clinical impact on assessing anatomical structures. However, each of the datasets is small, partially labeled, and rarely investigates severe tumor subjects. Moreover, current models are limited to segmenting specific organs/tumors, which can not be extended to novel domains and classes. To tackle these limitations, we introduce embedding learned from Contrastive Language-Image Pre-training (CLIP) to segmentation models, dubbed the CLIP-Driven Universal Model. The Universal Model can better segment 25 organs and 6 types of tumors by exploiting the semantic relationship between abdominal structures. The model is developed from an assembly of 14 datasets with 3,410 CT scans and evaluated on 6,162 external CT scans from 3 datasets. We rank first on the public leaderboard of the Medical Segmentation Decathlon (MSD) and achieve the state-of-the-art results on Beyond The Cranial Vault (BTCV). Compared with dataset-specific models, the Universal Model is computationally more efficient (6x faster), generalizes better to CT scans from varying sites, and shows stronger transfer learning performance on novel tasks. The design of CLIP embedding enables the Universal Model to be easily extended to new classes without catastrophically forgetting the previously learned classes.

translated by 谷歌翻译

A Concept Knowledge Graph for User Next Intent Prediction at Alipay

Yacheng He , Qianghuai Jia , Lin Yuan , Ruopeng Li , Yixin Ou , Ningyu Zhang

分类：自然语言处理 | 人工智能 | 机器学习

2023-01-02

This paper illustrates the technologies of user next intent prediction with a concept knowledge graph. The system has been deployed on the Web at Alipay, serving more than 100 million daily active users. Specifically, we propose AlipayKG to explicitly characterize user intent, which is an offline concept knowledge graph in the Life-Service domain modeling the historical behaviors of users, the rich content interacted by users and the relations between them. We further introduce a Transformer-based model which integrates expert rules from the knowledge graph to infer the online user's next intent. Experimental results demonstrate that the proposed system can effectively enhance the performance of the downstream tasks while retaining explainability.

translated by 谷歌翻译

EvidenceCap: Towards trustworthy medical image segmentation via evidential identity cap

Ke Zou , Xuedong Yuan , Xiaojing Shen , Yidi Chen , Meng Wang , Rick Siow Mong Goh , Yong Liu , Huazhu Fu

分类：计算机视觉

2023-01-01

Medical image segmentation (MIS) is essential for supporting disease diagnosis and treatment effect assessment. Despite considerable advances in artificial intelligence (AI) for MIS, clinicians remain skeptical of its utility, maintaining low confidence in such black box systems, with this problem being exacerbated by low generalization for out-of-distribution (OOD) data. To move towards effective clinical utilization, we propose a foundation model named EvidenceCap, which makes the box transparent in a quantifiable way by uncertainty estimation. EvidenceCap not only makes AI visible in regions of uncertainty and OOD data, but also enhances the reliability, robustness, and computational efficiency of MIS. Uncertainty is modeled explicitly through subjective logic theory to gather strong evidence from features. We show the effectiveness of EvidenceCap in three segmentation datasets and apply it to the clinic. Our work sheds light on clinical safe applications and explainable AI, and can contribute towards trustworthiness in the medical domain.

translated by 谷歌翻译

Depression Diagnosis and Analysis via Multimodal Multi-order Factor Fusion

Chengbo Yuan , Qianhui Xu , Yong Luo

分类：人工智能 | 计算机视觉

2022-12-31

Depression is a leading cause of death worldwide, and the diagnosis of depression is nontrivial. Multimodal learning is a popular solution for automatic diagnosis of depression, and the existing works suffer two main drawbacks: 1) the high-order interactions between different modalities can not be well exploited; and 2) interpretability of the models are weak. To remedy these drawbacks, we propose a multimodal multi-order factor fusion (MMFF) method. Our method can well exploit the high-order interactions between different modalities by extracting and assembling modality factors under the guide of a shared latent proxy. We conduct extensive experiments on two recent and popular datasets, E-DAIC-WOZ and CMDC, and the results show that our method achieve significantly better performance compared with other existing approaches. Besides, by analyzing the process of factor assembly, our model can intuitively show the contribution of each factor. This helps us understand the fusion mechanism.

translated by 谷歌翻译